FOREWORD


This  investigation  was  sponsored  by  Mr.  C.  C.  Stout,  NAVELEX, 
Code  330.  The  work  was  performed  by  the  investigator  at  the 
Naval  Postgraduate  School,  Monterey,  California. 

This  report  is  the  second  in  a  series  concerned  with  the 
possible  applications  of  voice  recognition  technology  in  command 
and control tasks. The first report was "Experiments with
Voice Input for Command and Control: Using Voice Input to
Operate a Distributed Computer Network" (Technical Report
NPS55-80-016), by Gary K. Poock, April 1980.


THE  EFFECTS  OF  CERTAIN  BACKGROUND  NOISES 
ON  THE  PERFORMANCE  OF  A  VOICE  RECOGNITION  SYSTEM 

I. EXECUTIVE SUMMARY

In this experiment the performance of a voice recognition
device was examined as a function of background noise conditions.
A subject trained the recognizer in one background noise
condition and used it in three background noise conditions.

The most important findings were that if the voice
recognition device is to be used in a 75dBA conversational noise
environment, then training the system in a 65 or 75dBA
conversational environment will yield fewer errors than training
it in a 38dBA white noise environment; while if one trains
in a 38, 65, or 75dBA environment, performance will be
satisfactory when the device is used in 38 or 65dBA environments.

II. INTRODUCTION

A. Problem. Voice recognition equipment is being considered
for use in various military command and control functions.
The effects, if any, of background noises upon the performance
of a command and control system using voice recognition
equipment are largely unknown. Before voice recognition equipment
is used in operational command and control systems, the
relationships between system performance and background noise
must be understood.

B. Objective. The objective of the experiment described in
this report was to determine the effect of background noise,
including human conversation, on the performance of a voice
recognition system.

C. Background. Technology allowing the use of voice input
to control machines has recently been developed. Although
in relative infancy, this technology has yielded equipment
that can be trained to recognize a set of utterances from
nearly continuous speech. Applications and experiments using
voice recognition equipment are burgeoning. Poock (1980),
for instance, reported on the use of voice input to operate
a distributed computer network. Also in 1980, the Department
of Defense (DoD) sponsored a conference on voice interactive
systems (Voice Interactive Systems: Applications and Payoffs,
1980). The DoD conference featured three days of presentations
covering a number of ways in which voice technology can be
used in man-machine systems. A presentation at the DoD
conference by Thomas G. Drennen discussed the effect of
attack/fighter cockpit noise on speech characteristics and on
voice recognition system performance. Drennen reported that the
voice recognition system he used performed more accurately
under extremely high (106 or 114dB) noise levels when the
training had been under similar noise levels (114dB) than
when the training had been done at a low (10dB) noise level
(see footnote 1). At testing levels of 10 or 101dB, however,
recognition accuracy was higher if the training had been done
in a 10 rather than a 114dB environment.

Drennen's noise environment represented cockpit conditions
under different aircraft power settings. In many command and
control applications, background voice messages and conversations
are present and might influence the performance of voice
recognition devices. It is important to determine if Drennen's
findings extend to environments in which the background noise
is human speech and to less extreme dB levels of background
noise.

III. APPROACH

A. Experimental Setting. The experiment was conducted in a
soundproof chamber. A Model T600 Threshold Technology, Inc.
voice recognition device was used with a Shure model SM10
microphone. With added memory modules, up to 256 two-second
voice utterances could have been used. In this experiment, 50
utterances were used. A maximum utterance length of two seconds
was a limitation imposed by the voice recognition device. For
more details on the operation of voice recognition equipment,
see Poock (1980).

B. Independent Variables. Two independent variables were
investigated in this experiment: first, the level of background
noise during the training of the Model T600 voice recognition
device; second, the level of background noise during the
testing of the voice recognition device. The training
noise level and the testing noise level independent variables
had the same three levels of noise: ambient noise (an average
of about 38dBA), conversational noise at an average of 65dBA,
and conversational noise at an average of 75dBA. For
the 65dBA and 75dBA average noise levels, the sound levels
varied from the average value by no more than ±7dBA. Sound
levels were measured at the microphone connected to the voice
recognition device.

The levels of background noise were measured using the
dBA-weighting network. The A-weighting network is very good
at giving a quick estimate of the interference of noise upon
speech (MIL-HDBK-759, p. 358). When dBA levels of 90-95dBA
and greater were tried, the voice recognizer tended to emit
a nearly continuous string of extra outputs even though no
one was speaking to it. Therefore, background noises of that
level were not considered for use in this experiment. Speech
interference levels (SIL) are often used to estimate maximum
permissible levels of background noises (Bragdon, p. 79).
The SIL can be determined from the dBA-weighted network (Bragdon,
p. 79). Tables are available (see, for instance, the Human
Engineering Guide to Equipment Design, p. 193) demonstrating
the relationship between speech level (normal, raised, very
loud, and shouting), distance between talker and listener,
and level of background noise that barely permits reliable
conversation. For example, for reliable conversation when
the speaker is one foot from the listener, the background
noise should not exceed 75dBA. A background noise of 65dBA
or less should permit reliable conversation when the speaker
and listener are three feet apart. Bragdon (1971, p. 79)
reports that when background noise approaches 80dBA, hearing
accuracy declines. Bragdon (1971, p. 80) also describes a
survey which found that 71dBA was a maximum acceptable level
for background noise for voice communications. At noise levels
greater than that, people reported their job performance was
adversely impacted. In conclusion, the three levels of
background sound used in this experiment (38dBA, 65dBA, and 75dBA)
should have covered the range of background noise intensities
likely to be found in many command and control environments.

C. Dependent Variables. Three types of voice recognition
system errors were recorded and added together to form the
error measure used in the analysis of results of the
experiment.

•  Wrong outputs: the recognizer gave the
wrong response to the subject's utterance.

•  "Beeps": the Model T600 Threshold Technology,
Inc. voice recognition device emitted an
audible beep when it did not recognize an
utterance.

•  Extra outputs: the voice recognition device
emitted a response when the subject had not
emitted an utterance. These outputs could
occur when the microphone was open either
before or after an utterance.

The dependent variable used in the analysis was formed
by summing the number of errors made by the voice
recognition device in each subject x test condition combination.

D. Experimental Design. This was a two-factor experiment
with repeated measures on one factor (Winer, p. 30?); subjects
were nested within one factor. Each subject trained the voice
recognition device under one of the noise conditions and tested
the voice recognition device under each of the three noise
conditions. Six subjects were randomly assigned to each of the
three training conditions. The test conditions were ordered
so that each test condition appeared an equal number of times
in first, second, and third place for each training condition.
Figure 1 portrays the design of the experiment.
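
The ordering constraint described above can be illustrated with a short sketch. The report does not say which orders were actually assigned, so the scheme below (giving the six subjects in a training group the six possible permutations of the three test levels) is only one assumed way of satisfying the constraint; the names TEST_LEVELS, orders, and position_counts are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Test noise levels in dBA, as given in the report.
TEST_LEVELS = (38, 65, 75)

# Assumption: one permutation per subject, six subjects per training group.
orders = list(permutations(TEST_LEVELS))


def position_counts(order_list):
    """Count how often each test level falls in each serial position."""
    counts = Counter()
    for order in order_list:
        for position, level in enumerate(order):
            counts[(level, position)] += 1
    return counts


if __name__ == "__main__":
    for (level, position), n in sorted(position_counts(orders).items()):
        print(f"{level} dBA in position {position + 1}: {n} subjects")
```

Under this assumed scheme every test level falls in each of the three serial positions for exactly two of the six subjects in a training group.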

E.  Training  and  Testing.  Each  subject  trained  the  voice 
recognition  device  to  the  same  list  of  50  utterances.  (A  copy 
of  the  list  of  utterances  is  provided  in  Appendix  I). 

During the training phase, the subject would repeat each
utterance 10 times. Following the 10 repetitions of an
utterance, the device was deemed to be trained if the utterance
was recognized correctly two out of three times. Training
with an utterance continued until the two-out-of-three
criterion was satisfied.
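
The training criterion can be sketched as a simple loop. This is an illustration only: the report does not state whether a failed check was followed by another full set of 10 repetitions, so that is assumed here, and train_pass, recognized, and max_passes are hypothetical stand-ins for the operator's actions and the device's response, not part of the T600 interface.

```python
from typing import Callable


def train_utterance(
    train_pass: Callable[[str, int], None],
    recognized: Callable[[str], bool],
    word: str,
    reps: int = 10,
    trials: int = 3,
    needed: int = 2,
    max_passes: int = 20,
) -> int:
    """Train one utterance until it passes the two-out-of-three check.

    Returns the number of training passes that were required.
    """
    for passes in range(1, max_passes + 1):
        # Assumption: each pass is a full set of `reps` repetitions.
        train_pass(word, reps)
        # Run all check trials and count the correct recognitions.
        hits = sum(1 for _ in range(trials) if recognized(word))
        if hits >= needed:
            return passes
    raise RuntimeError(f"{word!r} not trained after {max_passes} passes")
```

A caller would supply the two callbacks, for example a function that prompts the subject for the repetitions and one that reports whether the device's output matched the utterance.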

During  the  testing  phase  of  the  experiment,  the  subject 
was  instructed  to  read  each  word  only  once  (under  each  test 
background  noise  condition).  An  error  was  counted  if  the  voice 
recognizer  emitted  the  wrong  output,  "beeped",  or  emitted  an 
output  when  the  subject  had  not  spoken  one  of  the  utterances. 

A copy of the instructions given to the subjects is
provided in Appendix II. The instruction sheet also includes
prompts to be followed by the experimenter.

Figure 1

CONCEPTUAL DESIGN OF THE EXPERIMENT

Each subject trained the device in one dBA level,
but tested the device in all three dBA levels.

EXAMPLE: Recognition errors with subject #18 when the
device was trained at 75 dBA and tested at 75 dBA.


IV. RESULTS


A. Number of Errors. Table 1 presents the number of errors
and mean number of errors for the different experimental
conditions. (Appendix III presents the data by type and number
of errors, by subject.)

B. Analysis of Variance. An analysis of variance was made
of the error data shown in Table 1. Table 2 presents the
results of that analysis.

The only F-statistic significant in Table 2 is the one
for test noise level. Because certain assumptions about the
subjects' covariance matrices must be met or the sampling
distribution of the F statistic will not be the F distribution,
a conservative test was also applied to the Test Noise Level
variable. Winer (pp. 305-306) describes a conservative test
developed by Greenhouse and Geisser. For that test, the
degrees of freedom to be used in this experiment for the critical
value of the F statistic for the Test Noise Level are (1,15).
Using those degrees of freedom, the F statistic for Test Noise
Level is still statistically significant (p < .01).
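
The (1,15) degrees of freedom follow from the design sizes. The sketch below is illustrative arithmetic only, assuming I = 3 training groups, J = 3 test levels, and K = 6 subjects per group as described in the Approach; the function name conservative_df is hypothetical.

```python
def conservative_df(n_groups: int, n_levels: int, n_per_group: int):
    """Return (usual df, Greenhouse-Geisser conservative df) for the
    repeated factor in a design with subjects nested within groups."""
    error_df = n_groups * (n_per_group - 1) * (n_levels - 1)
    usual = (n_levels - 1, error_df)
    # The conservative test divides both df by (n_levels - 1).
    conservative = (1, error_df // (n_levels - 1))
    return usual, conservative


if __name__ == "__main__":
    usual, conservative = conservative_df(3, 3, 6)
    print("usual df:", usual)
    print("conservative df:", conservative)
```

With these sizes the usual degrees of freedom are (2,30) and the conservative ones are (1,15), matching the values quoted above.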

Scheffe’s  confidence  intervals  (Winer,  p.  85)  were  used 
to  make  a  posteriori  comparisons  among  the  three  testing  noise 
condition  means.  The  confidence  intervals  are  presented  in 
Table  3. 

The results in Table 3 indicate (because zero is outside
the intervals) that the number of errors made by the voice
recognition device under the 75dBA testing noise level was
significantly greater than the number of errors made under
either the 65dBA or the 38dBA testing levels. The confidence
interval in Table 3 for the 38dBA vs. 65dBA contrast shows
(because zero is inside the interval) that the numbers of
errors made under those two testing conditions were not
significantly different.

TABLE 1 (caption fragment): ... Level, Testing Level, and by Subject


TABLE 3

Scheffe's Confidence Intervals for Differences Between
Pairs of Testing Condition Means

               Difference
               Between the
Contrast       Sample Means           95% Confidence Interval

μ38 - μ65      1.72 - 1.00 =  .72     C[-2.54 < μ38 - μ65 <  3.98] = .95

μ75 - μ38      7.83 - 1.72 = 6.11     C[ 2.85 < μ75 - μ38 <  9.37] = .95

μ75 - μ65      7.83 - 1.00 = 6.83     C[ 3.57 < μ75 - μ65 < 10.09] = .95
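
The Table 3 intervals all share a half-width of about 3.26. The sketch below shows one reading that approximately reproduces that half-width. The inputs are assumptions, not values stated alongside the table: the error mean square of 14.48 is taken from the footnotes, the critical value 3.32 is a tabled F(.95; 2, 30), and each test mean is taken to rest on 18 subjects (6 subjects in each of 3 training groups).

```python
import math

# Assumed inputs (see the paragraph above).
F_CRIT_95_2_30 = 3.32  # tabled F(.95; 2, 30)
MS_ERROR = 14.48       # MS[test level x subjects within training level]
N_PER_MEAN = 18        # 6 subjects x 3 training groups

# Test-condition means from Table 3, keyed by dBA level.
TEST_MEANS = {38: 1.72, 65: 1.00, 75: 7.83}


def scheffe_half_width(n_levels, f_crit, ms_error, n_per_mean):
    """Half-width of a Scheffe interval for a difference of two means."""
    return math.sqrt((n_levels - 1) * f_crit * ms_error * 2.0 / n_per_mean)


HALF = scheffe_half_width(3, F_CRIT_95_2_30, MS_ERROR, N_PER_MEAN)


def interval(a, b):
    """Scheffe interval for mean(a) - mean(b)."""
    diff = TEST_MEANS[a] - TEST_MEANS[b]
    return diff - HALF, diff + HALF


if __name__ == "__main__":
    for a, b in ((38, 65), (75, 38), (75, 65)):
        lo, hi = interval(a, b)
        print(f"mu{a} - mu{b}: {lo:.2f} to {hi:.2f}")
```

Under these assumptions the limits agree with Table 3 to within about .01, and only the 38 vs. 65dBA interval contains zero.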

Figure 2 provides plots of the average number of errors
made by the recognition device for each test noise level at
each of the training noise levels. In Figure 2, the noise
levels are reported in terms of decibels, while in Figure 3
the sound pressure levels are presented in terms of microbars.
The average threshold for human hearing is .0002 microbars,
which equals .0002 dynes/cm² or 10^-16 watts/cm² (Woodworth and
Schlosberg, p. 325). The microbar levels were found by solving
Equation 1 for P when SPL (sound pressure level) was 38, 65,
or 75dB, and Po = .0002.

Equation 1     SPL = 20 log10 (P / Po)

The lines graphed on Figures 2 and 3 have rather different
appearances because of the logarithmic relationship between the
decibel scale and sound pressures.
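
As an illustration of the conversion, the sketch below inverts Equation 1 to obtain P = Po x 10^(SPL/20). The reference pressure comes from the text; the function and constant names are hypothetical.

```python
P0_MICROBAR = 0.0002  # reference pressure (threshold of hearing), from the text


def spl_to_microbar(spl_db: float, p0: float = P0_MICROBAR) -> float:
    """Invert SPL = 20 * log10(P / P0) to get the pressure P in microbars."""
    return p0 * 10 ** (spl_db / 20.0)


if __name__ == "__main__":
    for level in (38, 65, 75):
        print(f"{level} dB -> {spl_to_microbar(level):.4f} microbars")
```

The three levels used in the experiment come out at roughly 0.016, 0.36, and 1.12 microbars, which is why equal steps in dB are so unevenly spaced on the microbar axis.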

Statistically significant (p ≤ .05) differences between
pairs of training x testing condition mean numbers of errors
are indicated on Figure 2. Scheffe's confidence intervals
were used with α = .05 to contrast pairs of test x training
condition means (see footnotes 3, 4, and 5).

Figure 2.

MEAN NUMBER OF RECOGNITION DEVICE ERRORS BY
TEST AND TRAINING NOISE LEVELS IN dBA.

(Axes: Mean Number of Errors When Tested, by Training level.)


Figure 3.

MEAN NUMBER OF RECOGNITION DEVICE ERRORS
BY TEST AND TRAINING NOISE LEVELS IN
MICROBARS.

(.0002 MICROBARS = THRESHOLD OF HEARING)

(Axes: Mean Number of Errors When Tested, by Training level.)

The results in Table 2 showed that only the F-test for
test noise level was statistically significant (p < .01).
Scheffe's contrasts (Table 3) showed that the average number
of recognition errors was different (p < .05) under the 75dBA
testing condition from the average number of recognition errors
under either the 38dBA or 65dBA testing condition. These
significant differences are shown in Figure 2. Scheffe's
contrasts were also used to contrast pairs of test means within
and between training conditions. None of the contrasts between
pairs of means from different training and different noise
level conditions was statistically significant (α = .05).
Within training conditions, the only pair of means that was
significantly different (α = .05) was within the training at
38dBA condition: the means from testing at 75dBA and 65dBA
(when trained at 38dBA) were significantly different.
Additionally, within the 38dBA training condition, the overall
average of the 38dBA and 65dBA average numbers of errors was
significantly less than the average number of errors made
under the 75dBA testing condition.

Using the nomenclature of Table 1, the joint mean of
cells [...] and [...] was not significantly different from the mean
of cell 21. In other words, the average of the two high points
[...] not differ significantly [...] dBA line.

A second analysis of variance was conducted using a slightly
different dependent variable from the one used in the analyses
reported in the preceding paragraph. In the second analysis
of variance, the type of error labeled extra outputs was
excluded from the data, leaving only wrong outputs and "beeps"
in the dependent variable. This was done because different
microphone utilization practices, or use of a better sound
cancelling microphone, might reduce or eliminate extra outputs.
The data and the analysis of variance table for this dependent
variable excluding extra outputs are given in Appendix IV.
Suffice it to say, removing the extra output errors did not
change the results of the analysis of variance.

V. DISCUSSION AND CONCLUSIONS

The results from the experiment reported here indicate
that only the noise condition during testing influenced the
number of errors made by the Model T600 Threshold Technology,
Inc. voice recognition device. Unlike the results obtained
by Drennen, no interaction was found between testing and
training background noise levels and number of errors made
by the voice recognition device. It should be noted that
the sound pressure levels used in this experiment (38, 65, or
75dBA) did not approach the sound intensity levels used by
Drennen (10, 101, 106, or 114dB). Drennen (reference note 2)
does not consider the results of this experiment to be in
conflict with the results he obtained, because he believes the
interaction between testing and training background noises
will not be evident until dB levels of around 100 or more
are used.

The results of this experiment indicate that care must
be exercised if the Model T600 Threshold Technology, Inc.
voice recognition device and Shure SM10 microphone are used
in an environment with an average conversational background
of 75dBA. Overall (averaged over the three training noise
levels), this experiment indicates a higher number of errors
in a 75dBA background noise environment than at either the
38 or 65dBA levels. However, a posteriori tests of the mean
numbers of errors showed the only significant difference
between the 75dBA test condition line and the other two lines
in Figure 2 was at the 38dBA training condition. The null
hypothesis of no difference in testing performance at 38, 65,
or 75dBA cannot be rejected if the device is trained in either
a 65 or 75dBA environment. In brief, the results from this
experiment indicate that if the Model T600 Threshold
Technology, Inc. voice recognition device and Shure SM10
microphone are to be used in a 75dBA conversational background
noise environment, then training in a 65 or 75dBA conversational
noise environment will yield fewer errors than will training
in a 38dBA white noise environment.

There was no significant difference between the average
number of errors made in the 65dBA testing condition versus
the 38dBA testing condition. The 38 and 65dBA lines in Figures
2 and 3 represent the mean number of errors obtained from the
experiment, but the differences between pairs of 38 and 65dBA
means are not statistically significant, despite what might
be concluded from casually viewing those lines in Figures 2
and 3.

VI. POSSIBLE FUTURE RESEARCH

Many other possible experiments were suggested during the
course of the experiment described in this report. The
following are suggestions for future experiments.

•  The effects of more extreme dB levels of
background noise on performance of the
speech recognizer should be determined.

•  The effects of background sounds that
include utterances to which the recognizer
has been trained should be examined.

•  The effects of different kinds of
background noises, e.g., impact sounds, should
be studied.

•  The effects of different background noise
levels when different noise cancelling
microphones are used, and the effects of
different adjustments to the recognizer,
should be determined.

•  The effects of differences among users
should be studied. (It was noted during
this experiment that one subject had
difficulty raising his voice to a level
comparable to, or above that of, the 75dBA
background noise.)

•  The effects of training of users should
be ascertained. Can users be trained to
perform in ways that will maintain system
performance under different background
noise conditions?

•  Experiments should be conducted in typical
command and control types of rooms,
compartments, etc., as sound reverberations
in such locations may influence the
performance of a voice recognition system.
(The experiment described in this report
was conducted in a soundproof room, which
also allowed few sound reflections within
the room.)

•  An experiment should be conducted to
determine if training in a low dBA (e.g., 38dBA)
conversational environment (if such a low
dBA conversational environment can be
developed) has the same effect on performance of
the recognizer as does training in a low
intensity white noise environment.


FOOTNOTES


1. Speech recognition devices are "trained" to recognize selected
utterances made by a person. The device is put in a training
mode, and the person repeats the particular utterance a number
of times. The device can then be tested to determine if it
recognizes the utterance.

2. The conversational noises were recorded as about twenty people
in a room talked informally. They were unaware they were being
recorded. For purposes of the experiment, a several minute
segment of the original recording was re-recorded to yield a
thirty minute length tape. The result of this process was a
fairly constant hub-bub of voices, with recognizable words,
but no recognizable conversations. The desired level was
attained by adjusting gain on an amplifier.

3. Within the same training noise level, the confidence interval
for contrasts between a pair of means was:

C = sqrt{ (IJ-1) F.95[(IJ-1), I(J-1)(K-1)]
          x 2 MS[test level x Subs (Train. Level)] / K }

C = sqrt{ 8 x 2.27 x (2/6) x 14.48 } = 9.35.


4. The confidence interval for contrasting pairs of means from
different training and different testing conditions was:

C = sqrt{ (IJ-1) F.95(1,45) x 2 [(J-1) MS(BC(A)) + MS(C(A))] / (JK) }

C = sqrt{ (9-1) x 4.06 x 2 (2 x 14.48 + 13.87) / (3 x 6) } = 10.11


The degrees of freedom for the denominator of the F statistic
were computed (Reference note 1) from:

DFd = [(J-1) MS(BC(A)) + MS(C(A))]^2
      / { [(J-1) MS(BC(A))]^2 / [I(J-1)(K-1)] + [MS(C(A))]^2 / [I(K-1)] }

DFd = 44.9 -> 45

5. The confidence interval for contrasting the combined average
of the number of errors in cell 11 (see Table 1) and cell 12
with the average number of errors in cell 13 was:

C = sqrt{ (IJ-1) F.95[(IJ-1), I(J-1)(K-1)] x MS[BC(A)] x [...] }

C = 16.62

APPENDIX I


The 50 Utterances Used In the Experiment

Word #   Utterance              Word #   Utterance

  0      GRID                     25     FIRE
  1      LAUNCH                   26     TIME
  2      COURSE                   27     MAP
  3      GOLF                     28     SCOPE
  4      SPEED                    29     MAINE
  5      MESSAGE                  30     NEUTRAL
  6      ORDERS                   31     REFUEL
  7      PLATFORM                 32     WHISKEY
  8      SENSOR                   33     LIMA
  9      MISSILE                  34     LOGOUT
 10      SATELLITE                35     TRACK UNKNOWN
 11      NEGATIVE                 36     LONGITUDE
 12      SUBMARINE                37     TORPEDO
 13      ENEMY                    38     BLUE FORCE ONE
 14      EXECUTE                  39     ROMEO
 15      SAN FRANCISCO            40     FLIGHT CONTROLLER
 16      HUMAN FACTORS            41     SEA OF JAPAN
 17      UNITED STATES            42     HONOLULU
 18      CLOSE OUT CHARLIE        43     ADVANTAGES
 19      COLORADO                 44     CONTINUOUS
 20      CONNECT TO CHARLIE       45     TASK FORCE COMMANDER
 21      NORTH ATLANTIC MAP       46     NORTH CAROLINA
 22      COMMAND AND CONTROL      47     BEARING AND DISTANCE
 23      CONTINUOUS SPEECH        48     PLOT ALL SUBMARINES
 24      VOICE TECHNOLOGY         49     UNITED AIR LINES


APPENDIX II


EXPERIMENTAL PROTOCOL AND SUBJECTS' INSTRUCTIONS

[Experimenter: Turn on machines. Load & remove T600 tape.
Record subject's name, etc., on data collection sheet.
Show subject the list of utterances. Put headset on subject.]

THIS IS AN EXPERIMENT DESIGNED TO EVALUATE SOME VOICE
RECOGNITION EQUIPMENT. I WISH TO EMPHASIZE THAT YOU
ARE NOT BEING EVALUATED; IT IS THE EQUIPMENT THAT
IS BEING EVALUATED.

THERE ARE TWO DISTINCT PHASES TO THIS EXPERIMENT. IN
THE FIRST PHASE, YOU WILL TRAIN THE EQUIPMENT TO RECOGNIZE
50 UTTERANCES, AN UTTERANCE BEING A SINGLE WORD OR
SEVERAL WORDS. THE TRAINING MAY BE DONE UNDER A BACKGROUND
NOISE CONDITION. IN THE SECOND PHASE OF THIS EXPERIMENT,
WE WILL TEST THE MACHINE TO SEE IF IT RECOGNIZES YOUR
VOICE. THE TEST WILL BE CONDUCTED UNDER THREE DIFFERENT
BACKGROUND NOISE CONDITIONS. TO SUMMARIZE, WE ARE
EVALUATING THE VOICE RECOGNITION EQUIPMENT BY HAVING YOU TRAIN
IT TO RECOGNIZE 50 UTTERANCES. THE TRAINING WILL BE DONE
UNDER ONE BACKGROUND NOISE CONDITION, AND THE TESTING WILL
BE DONE UNDER THREE BACKGROUND NOISE CONDITIONS.

DURING THE TRAINING PHASE, THE UTTERANCES WILL APPEAR ONE
AT A TIME ON THE SCREEN. THE UTTERANCES ARE ALSO ON THIS
PAPER. YOU WILL BE DIRECTED TO REPEAT EACH UTTERANCE 10
TIMES. ATTEMPT TO VARY THE WAY YOU PRONOUNCE AND GIVE
EMPHASIS TO DIFFERENT PARTS OF EACH UTTERANCE. BECAUSE
YOU ARE TO REPEAT EACH UTTERANCE 10 TIMES, YOU MAY FIND
IT USEFUL TO COUNT THE REPETITIONS ON YOUR FINGERS, OR
TO USE CLUSTERS OF, SAY, 3 UTTERANCES TO ALLOW YOU TO
KEEP TRACK OF THE NUMBER OF TIMES YOU HAVE MADE AN UTTERANCE.

TRY TO KEEP THE MICROPHONE IMMEDIATELY IN FRONT OF YOUR
LIPS AND CLOSE TO YOUR LIPS. THERE IS AN ON-OFF SWITCH
FOR THE MICROPHONE. WHEN YOU ARE NOT TRAINING THE MACHINE,
THE SWITCH SHOULD BE OFF. REMEMBER TO VARY THE WAY YOU
PRONOUNCE AND PHRASE THE UTTERANCES.

[Experimenter: Put list so it is between subject & keyboard
operator.]

THE 50 UTTERANCES ARE ON THIS LIST. WE'LL SIMPLY TRAIN
THEM IN THE ORDER THEY APPEAR ON THE LIST. WE'LL CHECK
THEM OFF AS WE GO ALONG.

[Experimenter: Turn on noise cassette if appropriate; adjust dB.]

WE'LL NOW HAVE YOU TRAIN THE UTTERANCES. (FIRST I'LL
TURN ON SOME BACKGROUND NOISE.)

[Experimenter: Type control u/Return/Word No.]

TRAIN UTTERANCES (Test following the training of each word.
Requires two-out-of-three recognition accuracy.)

[Experimenter: Check off the words. Turn off the cassette.]

YOU HAVE NOW FINISHED THE MOST TIME CONSUMING SEGMENT OF
THE EXPERIMENT. THE REMAINDER OF THE EXPERIMENT WILL GO
RATHER QUICKLY.

WE'LL NOW TEST THE MACHINE'S ABILITY TO RECOGNIZE YOUR
UTTERANCES. YOU'LL BE ASKED TO READ OUT LOUD THE 50
UTTERANCES THREE TIMES, EACH TIME UNDER A DIFFERENT
BACKGROUND NOISE CONDITION.

AFTER THE BACKGROUND NOISE BEGINS, MAKE SURE YOU HAVE
THE MICROPHONE SWITCH TURNED ON, AND THEN READ THROUGH
THE LIST OF 50 UTTERANCES. PAUSE SEVERAL SECONDS AFTER
EACH UTTERANCE, AND I MAY ASK YOU TO PAUSE EVEN LONGER
IF I GET BEHIND IN RECORDING ERRORS MADE BY THE EQUIPMENT.

[Experimenter: Rewind cassette. Turn on noise cassette, if
appropriate; adjust dB. Type control w/Return.]

FIRST TEST

PLEASE READ THE 50 UTTERANCES.

[Experimenter: Record errors/turn cassette off.]

WE'LL NOW REPEAT THE PROCEDURE UNDER A DIFFERENT BACKGROUND
NOISE CONDITION.

[Experimenter: Rewind cassette. Turn on noise cassette, if
appropriate; adjust dB.]

SECOND TEST

PLEASE READ THE 50 UTTERANCES.

[Experimenter: Record errors. Turn cassette off. Rewind cassette.]

WE'RE NOW READY FOR THE LAST TEST.

[Experimenter: Turn on noise cassette, if appropriate; adjust dB.]

THIRD TEST

PLEASE READ THE 50 UTTERANCES.

[Experimenter: Record errors/turn cassette off and rewind.]

Recognition Errors by Training and Testing Conditions, and by Subject

(Error types tabulated: Wrong Output, Beep, Extra Output)